Modern Statistical Computing

10. Reports

David Rossell

Pompeu Fabra University

Reproducing these lecture notes

Load required R packages

library(tidyverse)
library(dygraphs)
library(DT)
library(fivethirtyeight)
library(gapminder)
library(gt)
library(maps)
library(plotly)
library(threejs)
library(tidyquant)
library(shiny)
library(viridis)

Reporting results

R provides useful formats

  • Dashboard: short website displaying your main figures/tables/data

  • htmlwidgets: interactive html documents

  • Shiny apps: applications allowing for more user input

We cover a gentle intro. Further resources

Dashboard

File -> New file -> Quarto document. Select html format

Your document should start like this

---
title: "Diamonds dashboard"
format: html
execute:
  echo: false
---

The echo: false option disables the printing of code (only output is displayed)

To display individual code chunks use #| echo: true


```{r}
#| echo: true
2 * 2
```

You can create multiple tabs as follows

  • Write :::panel-tabset at start and ::: at the end

  • Start a new tab with ## Title. Then put R code


:::panel-tabset

## Scatterplot

```{r}
ggplot(diamonds, aes(carat, price)) +
  geom_point()
```

## Histogram

```{r}
ggplot(diamonds, aes(price)) +
  geom_histogram()
```

:::

The next slide shows the result

ggplot(diamonds, aes(carat, price)) +
  geom_point()

ggplot(diamonds, aes(price)) +
  geom_histogram()

Plot layout

Code chunks have layout attributes to arrange the output

  • #| layout-nrow: number of rows

  • #| layout-nrow: number of columns

  • #| layout: []: rows / columns of different sizes

It may also be useful to define captions

  • #| fig-cap: text for figure caption

  • #| cap-location: where to place to caption (top, bottom or margin)

  • #| fig-subcap: figure sub-captions

Example

2 plots in a single row, with figure captions


```{r}
#| layout-nrow: 1
#| fig-cap: 
#|   - "Histogram of prices"
#|   - "Prices vs carats"
ggplot(diamonds, aes(price)) +
  geom_histogram()
ggplot(diamonds, aes(carat, price)) +
  geom_point()
```

The result is shown in the next slide

Histogram of prices

Prices vs carats

Custom layouts: specify for each row the size of each figure

Example: 2 figures in 1st row of size 0.5 each, 1 figure in 2nd row


```{r}
#| layout: [[0.5,0.5], [1]]
ggplot(diamonds, aes(carat)) +
  geom_histogram()
ggplot(diamonds, aes(price)) +
  geom_histogram()
ggplot(diamonds, aes(carat, price)) +
  geom_point()
```

Displaying tables & data

  • gt in package gt: styling transformations to display a table (column names, alignment etc.)

  • datatable in package DT: display data as html. Provides filtering, pagination, search & sorting

An example by Hadley Wickam: file ../examples/diamonds_dashboard.qmd

htmlwidgets

Some packages to create interactive plots without using any Javascript (if you know Javascript, then more advanced options are possible)

  • Time series plots: dygraphs

  • Make a ggplot interactive: plotly

  • 3D scatter-plots and globes: threejs

  • Data tables: DT

dygraphs

dygraph visualizes times series interactively

  • 1st argument is the data. Either a time series (xts object) or a data.frame

  • Other arguments to set labels etc.

Example. Female and male deaths from lung cancer

lungDeaths= cbind(mdeaths, fdeaths)
lungDeaths[1:5,]
     mdeaths fdeaths
[1,]    2134     901
[2,]    1863     689
[3,]    1877     827
[4,]    1877     677
[5,]    1492     522

Zoom into x-axis regions. Double-click to zoom out

dygraph(lungDeaths)

Some plot options

dygraph(lungDeaths, main = "Deaths from Lung Disease (UK)") |>
  dyOptions(drawPoints = TRUE, pointSize = 3) |>              #show points
  dyHighlight(highlightSeriesOpts = list(strokeWidth = 3)) |> #thicker lines
  dyRangeSelector()                                           #add zooming panel

Plotly

Plotly dynamically displays info about elements of a gg-plot. See here for a gallery of what’s possible

Example. Scatter-plot for gapminder data

Save the ggplot into an object, then use ggplotly. Argument tooltip: what aesthetics from your ggplot (not from your original dataset) to show

library(plotly)

gm2007= filter(gapminder, year==2007)

p= ggplot(gm2007, aes(gdpPercap, lifeExp, text=country)) +
  geom_point(aes(colour=continent, size=pop), alpha = 0.7, show.legend = TRUE) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  labs(x='GDP per capita', y='Life expectancy')

ggplotly(p, tooltip=c("country","gdpPercap","lifeExp"))

Plotly

Alternatively, you can use plot_ly and layout to create plots directly, as illustrated below.

fig= plot_ly(gm2007, x = ~continent, y = ~lifeExp, split = ~continent, type = 'violin', box = list(visible = T)) 

layout(fig, xaxis = list(title = "Continent"), yaxis = list(title = "Life Expectancy (years)",zeroline = F))

plotly for time series

getSymbols("GOOG", from="2020-01-01", to="2021-12-31") #creates dataset GOOG
[1] "GOOG"
GOOG[1:5,]
           GOOG.Open GOOG.High GOOG.Low GOOG.Close GOOG.Volume GOOG.Adjusted
2020-01-02   67.0775   68.4070  67.0775    68.3685    28132000       68.3685
2020-01-03   67.3930   68.6250  67.2772    68.0330    23728000       68.0330
2020-01-06   67.5000   69.8250  67.5000    69.7105    34646000       69.7105
2020-01-07   69.8970   70.1495  69.5190    69.6670    30054000       69.6670
2020-01-08   69.6040   70.5790  69.5420    70.2160    30560000       70.2160

Compute stock market value relative to start date

stock= data.frame(GOOG$GOOG.Adjusted)
stock$GOOG.Adjusted= stock$GOOG.Adjusted/stock$GOOG.Adjusted[1]
stock= data.frame(stock,rownames(stock))
colnames(stock)= c('GOOG','date')
fig= plot_ly(stock, type = 'scatter', mode = 'lines') |>
  add_trace(x = ~date, y = ~GOOG, name = 'GOOG') |>
  layout(showlegend = F)
options(warn = -1)

layout(fig, xaxis = list(zerolinecolor = '#ffff', zerolinewidth = 2, gridcolor = 'ffff'), yaxis = list(zerolinecolor = '#ffff', zerolinewidth = 2, gridcolor = 'ffff'), plot_bgcolor='#e5ecf6', width = 900)

plotly for time series

3D scatterplots

sel= c('state_abbrev','avg_hatecrimes_per_100k_fbi','share_vote_trump','gini_index')
hc= hate_crimes[,sel]
names(hc)= c('state','hatecrimes_fbi','votes_trump','gini')
hc= filter(hc, !is.na(hatecrimes_fbi)) 

labs= c("Votes for Trump","Gini","Hate crimes FBI")
scatterplot3js(hc$votes_trump, hc$gini, hc$hatecrimes_fbi, axisLabels=labs)  |>
points3d(hc$votes_trump, hc$gini, hc$hatecrimes_fbi, color="red", pch=hc$state)

globejs

You can plot data on the globe, e.g. arcs

Code
f= read.table("../datasets/flights.txt", header=TRUE) #data flights from package threejs
f= mutate(f, dest=paste(round(f[,3],2),":",round(f[,4],2),sep="")) # approx location as factor

# select destinations with frequency >300
freq= group_by(f, dest) |> 
  summarise(nflights_dest=n())
fsel= left_join(f, freq) |>
  filter(nflights_dest > 300)

globejs(arcs=fsel[,1:4])

globejs

Or bars, e.g. city populations below

Code
data(world.cities, package="maps")
cities <- world.cities[order(world.cities$pop, decreasing=TRUE)[1:1000],]
value  <- 100 * cities$pop / max(cities$pop)
col <- colorRampPalette(c("cyan", "lightgreen"))(10)[floor(10 * value/100) + 1]
globejs(lat=cities$lat, long=cities$long, value=value, color=col, atmosphere=TRUE)

Data tables

datatable from package DT displays a dynamic table that can be searched, filtered, sorted… For a full description see here

datatable(gapminder, rownames=FALSE, caption="Life expectancy", options = list(pageLength=5))

Exercise

Reproduce two examples of your choice from the plotly gallery

Turn in an html with your solution here. Name your file firstname_lastname.html

Maps

R packages

sf (simple feature) and terra define objects to store map-related data.

tmap generates maps. It accepts sf and terra objects, plus other spatial classes

library(sf)      # sf format to store data
library(terra)   # format to store data
library(tmap)    # static and interactive maps

Obtaining spatial data

library(geodata)  #download spatial data (region boundaries)
library(spData)       #pre-defined spatial data
library(spDataLarge)  #pre-defined spatial data

Interactive maps

library(mapview)  #interactive maps
library(leaflet) # for interactive maps

tmap

Basic functions

  • tm_shape: draw map boundaries

  • tm_fill: fill map with colours

  • tm_borders: draw region borders

  • tm_polygons: tm_fill + tm_borders

  • tm_raster: color cells (see tm_rgb)

  • tm_text: add text

  • tm_arrange: show various maps together

  • tmap_mode("plot"), tmap_mode("view"): static / interactive plots

Example. New Zealand

Package spData includes dataset nz, an object of type sf

class(nz)
[1] "sf"         "data.frame"
nz
Simple feature collection with 16 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 1090144 ymin: 4748537 xmax: 2089533 ymax: 6191874
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
First 10 features:
                Name Island Land_area Population Median_income Sex_ratio
1          Northland  North 12500.561     175500         23400 0.9424532
2           Auckland  North  4941.573    1657200         29600 0.9442858
3            Waikato  North 23900.036     460100         27900 0.9520500
4      Bay of Plenty  North 12071.145     299900         26200 0.9280391
5           Gisborne  North  8385.827      48500         24400 0.9349734
6        Hawke's Bay  North 14137.524     164000         26100 0.9238375
7           Taranaki  North  7254.480     118000         29100 0.9569363
8  Manawatu-Wanganui  North 22220.608     234500         25000 0.9387734
9         Wellington  North  8048.553     513900         32700 0.9335524
10        West Coast  South 23245.456      32400         26900 1.0139072
                             geom
1  MULTIPOLYGON (((1745493 600...
2  MULTIPOLYGON (((1803822 590...
3  MULTIPOLYGON (((1860345 585...
4  MULTIPOLYGON (((2049387 583...
5  MULTIPOLYGON (((2024489 567...
6  MULTIPOLYGON (((2024489 567...
7  MULTIPOLYGON (((1740438 571...
8  MULTIPOLYGON (((1866732 566...
9  MULTIPOLYGON (((1881590 548...
10 MULTIPOLYGON (((1557042 531...

Plotting regions

tm_shape(nz) +  # draw boundaries
  tm_fill() +   # fill map with gray-scale
  tm_borders()  # add borders

#same as tm_shape(nz) + tm_polygons()
tm_shape(nz) +
    tm_polygons() + 
    tm_symbols() #add points at center of each region

Elevations

Load elevation data nz_elev from package spDataLarge

nz_elev = rast(system.file("raster/nz_elev.tif", package = "spDataLarge"))
map_nz = tm_shape(nz) + tm_polygons()
map_nz1 = map_nz +
  tm_shape(nz_elev) + tm_raster(col_alpha = 0.7)
map_nz1

Arrange several maps

tmap_arrange(map_nz, map_nz1)

Color by variable

names(nz) # variables in the data
[1] "Name"          "Island"        "Land_area"     "Population"   
[5] "Median_income" "Sex_ratio"     "geom"         
tm_shape(nz) + tm_fill(fill = "Median_income")

tm_shape(nz) +
  tm_polygons(fill="Median_income", fill.scale=tm_scale(breaks= c(0,30000,40000,50000)))

tm_shape(nz) +
  tm_polygons(fill="Median_income", fill.scale=tm_scale(n= 10))

Getting data

Package geodata downloads geographical boundaries

  • gadm: administrative boundaries for any country

  • world: boundaries for countries in the world

Example. Spain

esp= gadm("spain", level=1, path="~/Downloads/geodata") #level=0 for country, level=1 for first subdivision, etc

tm_shape(esp) + tm_polygons()

esp2= gadm("spain", level=2, path="~/Downloads/geodata")
tm_shape(esp2) + tm_polygons()

#tidyverse functions like filter & select don't work
esp2= esp2[esp2$NAME_1 != 'Islas Canarias', ]
tm_shape(esp2) + tm_polygons()

Interactive map

mapview(esp)

Try also

tmap_mode("view")
tm_shape(esp) + tm_polygons()
tmap_mode("plot")

World bank statistics

wbstats package downloads data from the World Bank

  • Search available indicators with wb_search, wb_cache

  • Download specified indicators with wb_data

library(wbstats)
#indicators= wb_search(pattern = "inequality")
#wbdata= wb_cache()  #data available at the World Bank
#wbdata$indicator
d= wb_data(indicator= "BN.GSR.GNFS.CD", start_date=2022, end_date=2022) #net trade in goods and services US$

Get world boundaries, convert to sf and fix errors in object

w= world(path="~/Downloads/geodata")
wsf= sf::st_as_sf(w)      #convert SpatVector w into sf
wsf= st_make_valid(wsf)   #fix errors

Merge with World bank data by country (column NAME_0 in wsf, country in d)

map1= left_join(wsf, d, by = c("NAME_0" = "country"))
ggplot(map1) + geom_sf(aes(fill = BN.GSR.GNFS.CD)) +
  scale_fill_viridis() + labs(fill = "NET TRADE (US$)") + theme_bw()

Shiny apps

Widgets allow user-based interactivity: all operations occur in your browser

Shiny apps also allow server-based interactions: operations are run by R in a server

To start a Shiny app

  1. Create a directory for the app in your computer

  2. Add a file app.R specifying how the app should look

For a primer on Shiny, see Hadley Wickam’s Mastering Shiny. See also

app.R

A minimal example

library(shiny)

ui= fluidPage(  #specify user interface

    selectInput("color", label="Color", choices= c("Blue","Red","Green")), #dropdown menu
    
    checkboxInput("married", "Are you married?", value=FALSE), #check-box

)

server= function(input, output, session) {  #server calculations
}

shinyApp(ui, server) #run the App

Click the “Run App” button in RStudio (or outside RStudio, call shiny::runApp() with the directory containing app.R)

In server you store all results in output, which is a list. The returned element must be the result of a render function:

  • renderPlot: return a plot (plotOutput in ui)

  • renderPrint: print results as in R console (paired with or verbatimOutput in ui)

  • renderText: return single string (paired with textOuput in ui)

  • renderTable: print a static table (tableOutput in ui). For short tables

  • renderDataTable: print a dynamic table (dataTableOutput in ui). For large tables/data

ui= fluidPage(  #specify user interface

    selectInput("color", label="Color", choices= c("Blue","Red","Green")), #dropdown menu
    
    verbatimTextOutput("tablehomeworld"), #print output$tablehomeworld

)


server= function(input, output, session) {  #server calculations

  output$tablehomeworld= renderPrint({
    data= filter(starwars, skin_color == input$color) #input$color from selectInput
    table(data$homeworld)
  })
  
}

Some useful functions

  • Print text. p("aaa"), strong("bold face text"), helpText("italics text")

  • Headers. titlePanel, h1("aaa"), h2("aaa"), h3("aaa")

  • Line break. br()

  • Update results button. submitButton("Get results")

  • Panel arrangement: sidebarLayout, mainPanel, sidebarPanel

Capturing inputs

animals= c("dog", "cat", "other") #define vector used multiple times below

ui <- fluidPage(
  selectInput("animal", "Favourite animal?", animals, multiple=FALSE), #single choice
  radioButtons("animal", "Favourite animal?", animals), #single choice
  checkboxGroupInput("animal", "Animals you like?", animals) , #multiple choice

  textInput("name", "What's your name?"), #short text box
  textAreaInput("story", "Tell me about yourself", rows = 3)         #larger text box
  
  numericInput("num", "Number one", value = 0, min = 0, max = 100),  #numeric box
  sliderInput("num2", "Number two", value = 50, min = 0, max = 100), #slider
  
  dateInput("dob", "When were you born?"),                 #single date
  dateRangeInput("holiday", "Departure and return date?")  #date range
  
  fileInput("upload", NULL) #input file
)
  • 1st argument: identifier of the variable where input is stored

  • 2nd argument: text to display

  • value: default value (optional)

  • Other arguments: specific on each input box

Deploying Shiny to a server

Easiest option: Shinyapps.io. Free plan allows 5 apps, 25 active hours

After creating an account and setting up the system, you can submit/update the app as follows

library(rsconnect)
rsconnect::deployApp("../examples/shinyapp") #specify local directory where you saved the app

Other options available, such as hosting your own server, see here)

Exercise

Check out ../examples/shinyapp/app.R. Using this as a template, create a Shiny app where

  • The user selects a country in the gapminder data (package gapminder)

  • A scatterplot of life expectancy vs. year is produced using only the data from that country, adding a linear regression line

  • The estimated regression coefficients are reported as separate text output